Patterns of deliberate human activity and behavior are of utmost
importance in areas as diverse as disease spread, resource al- 
location, and emergency response. Because of its widespread 
availability and use, e-mail correspondence provides an attractive 
proxy for studying human activity. Recently, it was reported that
the probability density for the inter-event time $\tau$ between consecutively
sent e-mails decays asymptotically as $\tau^{-\alpha}$, with $\alpha \approx 1$.
The slower than exponential decay of the inter-event time distri- 
bution suggests that deliberate human activity is inherently non- 
Poissonian. Here, we demonstrate that the approximate power- 
law scaling of the inter-event time distribution is a consequence 
of circadian and weekly cycles of human activity. We propose 
a cascading non-homogeneous Poisson process which explicitly 
integrates these periodic patterns in activity with an individual's 
tendency to continue participating in an activity. Using standard 
statistical techniques, we show that our model is consistent with 
the empirical data. Our findings may also provide insight into the 
origins of heavy-tailed distributions in other complex systems. 

human activity | point process | hypothesis testing | complex systems 

The analysis of social and economic data has a long and illustrious
history [1, 2, 3]. Despite their idiosyncratic complexity, a number
of striking statistical regularities are known to describe individual and
societal human behavior [4, 5, 6, 7]. These regularities are of enormous
practical importance because they provide insight into how individual
behaviors influence social and economic outcomes. Indeed,
much of the current research on complex systems aims to quantify
the impact of individual agents on the organization and dynamics of
the system as a whole [8, 9]. Before we can predict how individuals
affect, for example, the organization of systems, it is paramount to
understand the behavior of the individual agents.

The current availability of digital records has made it much easier
for researchers to quantitatively investigate various aspects of human
behavior [10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21]. In
particular, e-mail communication records are attracting much attention
as a proxy for quantifying deliberate human behavior due to
the omnipresence of e-mail communication and availability of e-mail
records [13, 14, 18, 16]. The data, however, do not provide a detailed
record of all of the activities in which each individual participates;
we do not know, for instance, when an individual is sleeping,
eating, walking, or even browsing the web. The resulting uncertainty
in deliberate human activity thus poses a fundamental challenge to
quantifying and modeling human behavior.

Researchers commonly account for uncertainty or lack of information
through stochastic models. One of the simplest stochastic
models for human activity is a point process in which independent
events occur at a constant rate $\rho$. Such processes are referred to
as homogeneous Poisson processes, and they are used to describe
a large class of phenomena, including some aspects of human activity [22].
Homogeneous Poisson processes have two well-known
statistical properties: the time between consecutive events, the inter-event
time $\tau$, follows an exponential distribution, $p(\tau) = \rho e^{-\rho\tau}$, and
the number of events $N_T$ during a time interval of duration $T$ time
units follows a Poisson distribution with mean $\rho T$.

Several recent studies of deliberate human activity, including e-mail
correspondence, have focused on the former property. These
studies have reported that the empirical distribution of inter-event
times decays asymptotically as a power law, $p(\tau) \propto \tau^{-\alpha}$, with
exponent $\alpha \approx 1$ [23, 13, 14, 18]. Other studies have identified a similar
power-law scaling in the inter-event time distribution of many other
facets of human behavior, such as file downloads [10, 11, 12], letter
correspondence [13, 17, 18], library usage [17], broker trades [17],
web browsing [17, 19], human locomotor activity [20], and telephone
communication [21]. These observations are in stark contrast with
the predictions of a homogeneous Poisson process, suggesting that a
more suitable null model with which to compare mechanistic models
of human activity is a truncated power-law model with scaling
exponent $\alpha = 1$.*

The heavy-tailed nature of the distribution of inter-event times
prompts us to search for the mechanisms responsible for its emergence.
Two main classes of mechanisms can be considered: (i) human
behavior is primarily driven by rational decision making, which
introduces correlations in activity, thereby giving rise to heavy tails;†
(ii) human behavior is primarily driven by external factors such as
circadian and weekly cycles, which introduce a set of distinct characteristic
time scales, thereby giving rise to heavy tails.‡ While the
former interpretation has been shown to give rise to a truncated power-law
distribution of inter-event times, the latter has been rejected by
some authors [17]. Indeed, even though Hidalgo [24] investigated a
model with seasonal changes in activity rates that is able to generate
data with an approximate power-law decay in the distribution of
inter-event times with exponents $\alpha \approx 2$ or $\alpha \approx 1$, the $\alpha \approx 1$ case
requires a specific relationship between the rates of activity $\rho_i$ and the
corresponding duration of the seasons $T_i$ over which each rate holds.
It has therefore been argued that seasonality alone can only robustly
give rise to heavy-tailed inter-event time distributions with exponent
$\alpha \approx 2$ [17].

Here, we demonstrate that the distribution of inter-event times
in e-mail correspondence patterns displays systematic deviations from
the truncated power-law null model due to circadian and weekly patterns
of activity. We subsequently propose a mechanistic model that
incorporates these observed cycles, and a novel simulated annealing
procedure to nonparametrically estimate its parameters. We then use
Monte Carlo hypothesis testing to demonstrate that the predictions of
our model are consistent with the observed heavy-tailed inter-event
time distribution. Finally, we discuss the implications of our findings
for modeling human activity patterns and, more generally, complex
systems.

* For simplicity we use a truncated power-law with an exponent of $\alpha = 1$ as our null model.
Similar conclusions are reached when the power-law scaling exponent is fit to the data or when
other heavy-tailed null models (e.g. log-normal or log-uniform distributions [16]) are considered.

† If humans make decisions based on their own previous memories, then we might expect that
humans are heavily influenced by recent events. That is, the probability $\rho\,dt$ that an event will
happen in a time interval $dt$ is not constant, but is instead a decreasing function of the time
elapsed since the last event [18].

‡ This interpretation does not rely on highly competent human behavior and allows for the possibility
that human activity, and hence the time dependence of $\rho$, is modulated by instinct, the
environment, or social stimuli.

Empirical patterns 

We study a database of e-mail records for 3,188 e-mail accounts at a
European university over an 83-day period [23]. Each record comprises
a sender identifier, a recipient identifier, the size of the e-mail,
and a time stamp with a precision of one second. We preprocess the
data set and identify a set of 394 accounts which provide enough data
to quantify human activity and which are likely neither spammers nor
listservs (see SI Sec. S1).

In order to gain some intuition about e-mail activity patterns, let
us consider a fictitious student, Katie.§ Katie arrives at the university
20 minutes before her Thursday morning class. During this time, she 
decides to check her e-mail and sends three e-mails. Katie checks her 
e-mail after lunch and sends a brief e-mail to a friend before her next 
class. Later that evening, Katie sends four more e-mails once she has 
finished her homework. Katie does not check her e-mail again un- 
til the following day when she sends e-mails intermittently between 
attending classes, completing homework assignments, and meeting 
social engagements. Katie spends the weekend without e-mail access 
and doesn't send another e-mail until Monday. Katie's e-mail activity,
which is similar to that of many e-mail users, is both periodic and cascading.
That is, there are periodic changes in her activity rate, which account
for her sleep and work patterns, and there are cascades of activity
(active intervals) of varying length when Katie primarily focuses on
e-mail correspondence (Fig. 1).

If our intuition about deliberate human activity is correct, then the
periodic patterns of activity should manifest themselves in the inter-event
time statistics, particularly when compared with the predictions of the
truncated power-law null model, which does not account for temporal
periodicities (see SI Sec. S4). Specifically, we anticipate that e-mail
users typically send e-mails during the same 8-hour periods of the
day. We therefore expect the data to have significantly more inter-event
times of 24 ± 8 hours, the time required to send e-mails
on consecutive workdays, than the truncated power-law model predicts;
that is, the null model should underestimate the
number of inter-event times between 16 and 32 hours. Due to the normalization
of the probability density, the truncated power-law model
will consequently overestimate other inter-event times. These predictions are all
confirmed by the data, suggesting that periodicity is a fundamental
aspect of human activity (Fig. 2).
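
To make the null-model side of this argument concrete, the sketch below computes the share of inter-event times that a truncated power law with exponent 1 places between 16 and 32 hours. The truncation bounds are assumptions made only for illustration (one second, the time-stamp precision, and 83 days, the observation period); the fitted bounds used in the paper may differ.

```python
import math
import random

SECONDS_PER_HOUR = 3600.0


def truncated_power_law_cdf(tau, tau_min, tau_max):
    """CDF of a density proportional to 1/tau on [tau_min, tau_max] (exponent 1)."""
    if tau <= tau_min:
        return 0.0
    if tau >= tau_max:
        return 1.0
    return math.log(tau / tau_min) / math.log(tau_max / tau_min)


def sample_truncated_power_law(tau_min, tau_max, rng):
    """Inverse-CDF sampling: ln(tau) is uniform between the two bounds."""
    return tau_min * (tau_max / tau_min) ** rng.random()


# Hypothetical truncation bounds, in seconds.
tau_min = 1.0
tau_max = 83 * 24 * SECONDS_PER_HOUR
lo, hi = 16 * SECONDS_PER_HOUR, 32 * SECONDS_PER_HOUR

# Analytic share of inter-event times between 16 and 32 hours.
expected_fraction = (truncated_power_law_cdf(hi, tau_min, tau_max)
                     - truncated_power_law_cdf(lo, tau_min, tau_max))

# The same share estimated from samples, as a sanity check on the sampler.
rng = random.Random(7)
samples = [sample_truncated_power_law(tau_min, tau_max, rng) for _ in range(100_000)]
sampled_fraction = sum(lo <= s <= hi for s in samples) / len(samples)

print(f"analytic fraction in [16 h, 32 h]: {expected_fraction:.4f}")
print(f"sampled fraction in [16 h, 32 h]:  {sampled_fraction:.4f}")
```

Under these assumed bounds the null model assigns the same probability to every factor-of-two range of inter-event times, so any surplus of roughly one-day gaps in the data must appear as a systematic residual.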



Model 

We propose a model of e-mail usage that incorporates the hypothesized
periodic and cascading features of human activity. We account
for periodic activity with a primary process, which we model as a
non-homogeneous Poisson process. Whereas a homogeneous Poisson process
has a constant rate $\rho$, a non-homogeneous Poisson process has a
rate $\rho(t)$ that depends on time. In our model, the rate $\rho(t)$ depends on
time in a periodic manner; that is, $\rho(t) = \rho(t + W)$, where $W$ is the
period of the process. Consistent with our observations (Fig. 3), we
relate the rate of the non-homogeneous Poisson process to the daily
and weekly distributions of active interval initiation, $p_d(t)$ and $p_w(t)$:

$\rho(t) = N_w \, p_d(t) \, p_w(t)$, [1]

where the period $W$ is one week and the proportionality constant $N_w$
is the average number of active intervals per week.¶

We further assume that each event generated from the primary
process initiates a secondary process, which we model as a homogeneous
Poisson process with rate $\rho_a$. We refer to these "cascades of
activity" as active intervals, during which $N_a$ additional events occur,
where $N_a$ is drawn from some distribution $p(N_a)$. Once the $N_a$
events have occurred in the active interval, the activity of the individual
is again governed by the primary process defined by Eq. 1.
Our model thus mimics how individuals like Katie use e-mail: Katie
sends e-mails sporadically throughout the day, but once she starts
checking her e-mail, it is relatively easy to send additional e-mails
in rapid succession. We refer to the resulting model as a cascading
non-homogeneous Poisson process.‖
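
The generative process described above can be sketched as follows. This is an illustrative simulation under stated assumptions, not the authors' code: the primary rate is taken to be piecewise constant per hour of the week, it is sampled by thinning (a standard technique for non-homogeneous Poisson processes), and the weekday and working-hour weights, the within-interval rate of 30 events per hour, and the geometric choice for the distribution of additional events are all hypothetical.

```python
import random

HOURS_PER_WEEK = 168


def weekly_rate(n_w, p_day, p_hour):
    """Hourly primary rate: N_w times the day-of-week and hour-of-day weights."""
    return [n_w * p_day[h // 24] * p_hour[h % 24] for h in range(HOURS_PER_WEEK)]


def simulate_cascading_nhpp(rate, rho_a, draw_n_a, duration, rng):
    """Return (event_times, interval_ids); times are in hours."""
    rho_max = max(rate)
    events, labels = [], []
    t, interval = 0.0, 0
    while True:
        # Thinning: propose at the maximal rate, keep with probability rate/rho_max.
        t += rng.expovariate(rho_max)
        if t >= duration:
            break
        if rng.random() * rho_max >= rate[int(t) % HOURS_PER_WEEK]:
            continue
        # The accepted primary event opens a new active interval.
        events.append(t)
        labels.append(interval)
        # Secondary process: N_a additional events at the constant rate rho_a.
        for _ in range(draw_n_a(rng)):
            t += rng.expovariate(rho_a)
            events.append(t)
            labels.append(interval)
        # The primary process resumes only after the interval ends (no overlap).
        interval += 1
    return events, labels


def draw_n_a(rng):
    """Hypothetical p(N_a): geometric, with an average of one additional event."""
    n = 0
    while rng.random() < 0.5:
        n += 1
    return n


rng = random.Random(42)
p_day = [0.2] * 5 + [0.0] * 2                  # assumed: weekdays only
p_hour = [0.0] * 9 + [0.125] * 8 + [0.0] * 7   # assumed: 09:00 to 17:00
rate = weekly_rate(20.0, p_day, p_hour)        # assumed: N_w = 20 intervals per week
events, labels = simulate_cascading_nhpp(rate, 30.0, draw_n_a, 52 * HOURS_PER_WEEK, rng)
print(len(events), "events in", labels[-1] + 1, "active intervals")
```

Only the event times would be visible in real data; the interval labels returned here are exactly the hidden active interval configuration that has to be inferred.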

Results 

To compare our model with the empirical data, we first need to estimate
the parameters of our model from the data. Ideally, the data
would specify which events belong to the same active intervals (the
active interval configuration $C$) so that we could estimate the distributions
$p_d(t)$, $p_w(t)$, and $p(N_a)$. The data we analyze, however,
does not specify the actual active interval configuration $C$, so it is not
evident whether, for example, $p(N_a)$ should be described by a normal
or exponential distribution.

Because we do not know a priori the functional form of the activity
pattern in the cascading process, we cannot use the formalism
implemented by, for example, Scott and co-workers [34, 35]. Instead,
we introduce a new method that enables us to nonparametrically infer
the empirical distributions $p_d(t)$, $p_w(t)$, and $p(N_a)$ from the data.

Given a particular active interval configuration $C$, we can easily
calculate all of our model's parameters and compare its predictions
with the empirical data: $N_w$ is the average number of active intervals
per week; $p_d(t)$ and $p_w(t)$ are the probabilities of starting an active
interval at a particular time of day and week, respectively; the active
interval rate $\rho_a$ is the inverse of the average inter-event time in active
intervals; and the probability $p(N_a)$ of $N_a$ additional events occurring during
an active interval is estimated directly from the active interval
configuration (Fig. 3). We then manipulate the active interval configuration
$C$ to find the active interval configuration $\hat{C}$ that gives a best
estimate of the observed inter-event time distribution (see Methods).
This method allows us to infer the best-estimate distributions $p_d(t)$,
$p_w(t)$, and $p(N_a)$ given the data and our proposed model without
making any assumptions on their functional forms.
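
The parameter calculation for a given configuration can be sketched as below. The representation is an assumption made for illustration: event times are in hours measured from a Monday midnight, the configuration is a list of interval labels (one per event), and the daily and weekly distributions are binned by hour of day and day of week. The function and key names are hypothetical.

```python
from collections import Counter


def estimate_parameters(events, labels, n_weeks):
    """Estimate the model parameters from event times and interval labels."""
    intervals = {}
    for t, label in zip(events, labels):
        intervals.setdefault(label, []).append(t)

    starts = [times[0] for times in intervals.values()]
    n = len(starts)

    # Probability of starting an active interval at each hour of day / day of week.
    p_d = [0.0] * 24
    p_w = [0.0] * 7
    for s in starts:
        p_d[int(s) % 24] += 1.0 / n
        p_w[int(s // 24) % 7] += 1.0 / n

    # rho_a: inverse of the mean inter-event time inside active intervals.
    gaps = [b - a for times in intervals.values() for a, b in zip(times, times[1:])]
    rho_a = len(gaps) / sum(gaps) if gaps else None

    # p(N_a): empirical distribution of the number of additional events.
    sizes = Counter(len(times) - 1 for times in intervals.values())
    p_n_a = {k: c / n for k, c in sorted(sizes.items())}

    return {"N_w": n / n_weeks, "p_d": p_d, "p_w": p_w, "rho_a": rho_a, "p_N_a": p_n_a}


# Toy configuration: three active intervals in one week.
events = [9.0, 9.1, 9.3, 33.5, 58.0, 58.2]
labels = [0, 0, 0, 1, 2, 2]
params = estimate_parameters(events, labels, n_weeks=1)
print(params["N_w"], params["rho_a"], params["p_N_a"])
```

Because every quantity is a direct count or average over the configuration, no functional form has to be assumed for any of the distributions.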

We next compare the predictions of the cascading non-homogeneous
Poisson process with the empirical cumulative distribution
of inter-event times $P(\tau)$ for all 394 users under consideration
in the present study (see SI Fig. S7). Since we are using the empirical
data to estimate the parameters for our model (that is, the estimated
parameters depend on the data), we must use Monte Carlo hypothesis



§ We suspect that most users only had access to their e-mail at the university since the data
are obtained from a European university prior to 2004 [25].

¶ In specifying $N_w$ as the average number of active intervals per week, we are implicitly assuming
that the fraction of time spent in active intervals is very small. We have verified that
this is the case for all users under consideration. Also, it is important to choose the time step
$\Delta t$ in the binning of the empirical $p_d(t)$ to be sufficiently small such that the probability of an
event occurring at time $t$ is $\rho(t)\,\Delta t \ll 1$. We choose $\Delta t = 1/N_w$ hours, which meets this
criterion while still maintaining computational feasibility.

‖ Our model is similar in spirit to the Neyman-Scott cascading point process [26, 27] and the
Hawkes self-exciting process [28], except that in our model (i) the primary process is modulated
periodically by a non-homogeneous rate, and (ii) the active intervals are non-overlapping.
testing [29, 30] to assess the significance of the agreement between
the predictions of our model and the empirical data (see SI Sec. S3).
The visual agreement between our model's predictions and the data is confirmed by
p-values clearly above our 5% rejection threshold (Fig. 4).
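
The logic of Monte Carlo hypothesis testing with parameters estimated from the data can be illustrated with a generic sketch. It is not the procedure of SI Sec. S3: as stated assumptions, the model is a simple exponential distribution rather than the cascading process, the test statistic is the Kolmogorov-Smirnov distance rather than the area statistic, and the p-value uses the common (k + 1)/(n + 1) convention. The key point it shows is that each synthetic data set is re-fitted before its statistic is computed, which accounts for the fitted parameters' dependence on the data.

```python
import math
import random


def ks_distance(samples, cdf):
    """Largest gap between the empirical CDF of the samples and a model CDF."""
    xs = sorted(samples)
    n = len(xs)
    return max(max(cdf(x) - i / n, (i + 1) / n - cdf(x)) for i, x in enumerate(xs))


def fit_and_score(samples):
    """Fit an exponential rate by maximum likelihood and score the fit."""
    rate = len(samples) / sum(samples)
    return ks_distance(samples, lambda x: 1.0 - math.exp(-rate * x)), rate


def monte_carlo_p_value(samples, n_trials, rng):
    """Fraction of re-fitted synthetic data sets that fit at least as badly."""
    observed, rate = fit_and_score(samples)
    exceed = 0
    for _trial in range(n_trials):
        synthetic = [rng.expovariate(rate) for _item in samples]
        statistic, _refit_rate = fit_and_score(synthetic)
        if statistic >= observed:
            exceed += 1
    return (exceed + 1) / (n_trials + 1)


rng = random.Random(1)
plausible = [rng.expovariate(0.5) for _ in range(200)]   # drawn from the model family
implausible = [1.0 + 0.001 * k for k in range(200)]      # nearly constant gaps

p_plausible = monte_carlo_p_value(plausible, 200, rng)
p_implausible = monte_carlo_p_value(implausible, 200, rng)
print(f"p-value, exponential data:     {p_plausible:.3f}")
print(f"p-value, nearly constant data: {p_implausible:.3f}")
```

A model is rejected for a data set when the p-value falls below the chosen threshold, here 5%.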

In fact, the cascading non-homogeneous Poisson process can only
be rejected at the 5% significance level for one user, indicating that
our model cannot be rejected as a model of human dynamics. By
comparison, the truncated power-law null model is rejected at the 5%
significance level for 344 users. Indeed, the null model is always
rejected for many more users than the cascading non-homogeneous
Poisson process regardless of the rejection threshold selected, and our
model displays none of the systematic deviations from the data observed
for the truncated power-law null model (Fig. 5).



Discussion 

Our results clearly demonstrate that circadian and weekly cycles,
when coupled to cascading activity, can accurately describe the heavy
tails observed in e-mail communication patterns. The question then is,
would rational decision making together with circadian and weekly 
cycles be equally able to describe the statistical patterns observed for 
e-mail communication? Even if the answer to this question is af- 
firmative, parsimony suggests that rational decision making is not a 
necessary component of human activity patterns, given our simpler 
explanation. 

In addition to providing a good description of e-mail communi- 
cation patterns, we surmise that our model is readily applicable to 
many other conscious human activities. For instance, most people 
make telephone calls sporadically throughout the day. After a tele- 
phone call has been made, it is effortless to make another telephone 
call. Similarly, individuals run errands throughout the month. Once 
an individual runs one errand, it is easier to run another errand during 
the same trip than it is to run errands again the following day. Both 
of these anecdotes are illustrative of the way humans tend to optimize 
their time and effort to accomplish the tasks in their daily routines, a 
process that is captured by the periodic and cascading mechanisms in 
our model. 

The particular periodic and cascading features that are incorpo- 
rated into our model depend on the activity under consideration. For 
instance, sexual activity is influenced by menstrual cycles [31] and airline
travel is influenced by seasonality [32]. Furthermore, our model
can also be generalized to cases in which the parameters are not stationary.
This may be important, for instance, in the case of Darwin
and Einstein's letter correspondence in which the number of letters
sent per year increases 100-fold over 40 years [15, 18].

Although our model is only designed to account for a single activ- 
ity (e-mail correspondence), it can easily be extended to incorporate 
the multitude of activities in which any individual participates. To 
facilitate the inclusion of additional activities, it is useful to interpret
our model as a non-stationary hidden Markov point process [33, 34].
Within this framework, an individual switches between any two activities
$i$ and $j$ with some probability defined by a non-stationary Markov
transition matrix $T_{ij}(t)$ which depends on time $t$. For instance, our
model can be redefined as a non-stationary hidden Markov point pro- 
cess which switches between two states: a state in which an individual 
is not composing e-mails and a state in which an individual is com- 
posing e-mails. Predictions of models that incorporate more than one 
activity can then be verified against data that records several activities 
for a single individual. 

Our model further suggests a novel experiment [36] which not
only records when an individual has sent an e-mail, but also when
that individual is using a computer or actively utilizing an e-mail
client. This additional data would provide direct empirical evidence
for describing active intervals. In the absence of such data, we have
developed a simulated annealing procedure which allows us to
nonparametrically infer the hidden Markov structure of our model,
providing insight into how to compare our model with other cascading
point processes [26, 27].

While our model provides an accurate description of when an e-mail
is sent, a question left unaddressed is who the
probable recipient of that e-mail is going to be. For instance, one
might speculate that e-mails are sent randomly with some Poissonian
rate to acquaintances or individuals who share common interests.
Alternatively, it is plausible that e-mails are sent based on a perceived
priority of important tasks, perhaps in response to previous correspondence [14].
When combined with our model that statistically
describes when individuals send e-mails, quantifying the likely recipient
of an e-mail will provide an important step toward describing how
the structure of e-mail and social networks evolves.

Our study also provides a clear demonstration of how hypothesis
testing [29, 37] can objectively assess the validity of a proposed
model, a procedure we vehemently advocate. Using this methodology,
we demonstrate that while both models reproduce the asymptotic
scaling of the observed inter-event time distribution, our model
is consistent with the entire inter-event time distribution whereas the
truncated power-law null model is not.

The consequences of our findings are clear: demonstrating that a
model reproduces the asymptotic power-law scaling of a distribution
does not necessarily provide evidence that the model is an accurate
mechanistic description of the underlying process. Indeed, there is
mounting evidence that some purported power-law distributions in
complex systems may not be power laws at all [38, 39, 40]. There
may be a common explanation for these apparent power laws: complex
systems are inherently hierarchical but the distinct levels in the
hierarchy are difficult to distinguish [41]. In the case of e-mail correspondence,
for example, the active intervals are not recorded in the
data, thereby concealing the various scales of e-mail activity.
This demonstrates how the mixture of scales of activity can give rise
to scale-free activity patterns. We suspect that similar mixture-of-scales
explanations [42, 43, 44, 45, 46] may provide a basis for the
reported universality of heavy-tailed distributions in complex systems.

Methods 

Area test statistic. We quantify the agreement between a model
$M(\theta)$ with parameters $\theta$ and data set $\mathcal{D}$ by measuring the area $A$ between
the empirical cumulative distribution function $P_{\mathcal{D}}(u)$ and the
model cumulative distribution function $P_M(u|\theta)$:

$A = \int |P_{\mathcal{D}}(u) - P_M(u|\theta)| \, du$. [2]

We specify $u = \ln \tau$, which is roughly uniformly distributed, to improve
the numerical efficiency of our simulated annealing procedure.
The area test statistic is advantageous as it is easy to interpret and it
retains more information about the distribution than many other test
statistics (see SI Sec. S2).
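
A numerical version of the area between two cumulative distributions in the logarithm of the inter-event time can be sketched as follows. The integration range (the span of the observed data) and the midpoint rule with a fixed grid are assumptions of this sketch; the authors' numerical scheme is described in SI Sec. S2 and may differ. The model CDF is passed in as a plain function of the inter-event time.

```python
import bisect
import math


def area_statistic(taus, model_cdf, n_grid=2000):
    """Area between the empirical and model CDFs, integrated over u = ln(tau)."""
    us = sorted(math.log(tau) for tau in taus)
    lo, hi = us[0], us[-1]          # assumed integration range: span of the data
    du = (hi - lo) / n_grid
    area = 0.0
    for i in range(n_grid):
        u = lo + (i + 0.5) * du     # midpoint rule
        empirical = bisect.bisect_right(us, u) / len(us)
        area += abs(empirical - model_cdf(math.exp(u))) * du
    return area


def log_uniform_cdf(tau, tau_min=1.0, tau_max=math.exp(10.0)):
    """CDF of a truncated power law with exponent 1 (uniform in ln tau)."""
    if tau <= tau_min:
        return 0.0
    if tau >= tau_max:
        return 1.0
    return math.log(tau / tau_min) / math.log(tau_max / tau_min)


def exponential_cdf(tau, rate=1.0):
    return 1.0 - math.exp(-rate * tau)


# Synthetic "data": evenly spaced quantiles of the log-uniform distribution.
n = 1000
taus = [math.exp(10.0 * (k + 0.5) / n) for k in range(n)]

area_good = area_statistic(taus, log_uniform_cdf)   # model that generated the data
area_bad = area_statistic(taus, exponential_cdf)    # mismatched model
print(f"area vs. log-uniform model: {area_good:.4f}")
print(f"area vs. exponential model: {area_bad:.4f}")
```

Working in the logarithm of the inter-event time gives every decade of time scales equal weight, so deviations in the tail are not swamped by the many short inter-event times.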

Identifying active intervals. If we knew the actual active interval
configuration $C_{\mathcal{D}}$, it would be straightforward to compute the parameters
$\theta = \{N_w, p_d(t), p_w(t), \rho_a, p(N_a)\}$ of the cascading
non-homogeneous Poisson process. Because the data, however, do not identify
the actual active interval configuration $C_{\mathcal{D}}$, we must use heuristic
methods (see SI Sec. S5) to determine the best-estimate active interval
configuration $\hat{C}$, from which we can compute the best-estimate
parameters $\hat{\theta}$. We use simulated annealing to minimize the area test
PNAS | Issue Date | Volume | Issue Number | 3 



statistic $A$ (Eq. 2) for the inter-event time distribution. Thus, identifying
active intervals that are consistent with our expectations for our
model reduces to finding the best-estimate active interval configuration
$\hat{C}$ which minimizes the area $A$ between the empirical data and
the predictions of the cascading non-homogeneous Poisson process.

Our simulated annealing procedure is as follows. Starting from
a random active interval configuration $C$ in which adjacent events are
randomly assigned to the same active interval, we compute the parameters
$\theta$ of the cascading non-homogeneous Poisson process, then
we numerically estimate the cumulative distribution $P_M(u|\theta)$, and
finally we measure the area test statistic $A(C)$ of the active interval
configuration $C$. The active interval configuration is modified to a
new configuration $C'$ by either merging two adjacent active intervals
or by splitting an active interval. If the new configuration $C'$ reduces
the area test statistic, then the new configuration is unconditionally
accepted. Otherwise the configuration is conditionally accepted with
probability $\exp(-(A(C') - A(C))/T)$, where $T$ is the effective "temperature"
measured in units of the area test statistic $A$. After attempting
$2N$ configurations at each temperature so that each pair of $N$
consecutive events might be merged and split, we reduce the temperature
$T$ by 5% until the active interval configuration settles at the
best-estimate $\hat{C}$ without moving for 5 consecutive cooling stages.
We have verified that our simulated annealing procedure accurately
identifies active intervals and estimates parameters $\theta$ in synthetically
generated cascading non-homogeneous Poisson process data sets (see
SI Sec. S5 and Fig. S5).
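
The annealing loop described above can be sketched as below. Several elements are assumptions made for illustration: the configuration is encoded as one boolean per pair of consecutive events (True when both belong to the same active interval), so flipping an entry is a merge or a split; the cost function is supplied by the caller; and the toy cost used in the example (mismatch with a known target configuration) merely stands in for the area test statistic, which would require re-estimating the parameters and the model distribution at every step. The starting temperature and the stage cap are also hypothetical.

```python
import math
import random


def anneal(config, cost, rng, t0=1.0, cooling=0.95, patience=5, max_stages=10000):
    """Metropolis-style annealing over merge/split moves of a link configuration."""
    config = list(config)
    current = cost(config)
    temperature, quiet_stages = t0, 0
    n_events = len(config) + 1
    for _stage in range(max_stages):
        moved = False
        for _attempt in range(2 * n_events):
            i = rng.randrange(len(config))
            config[i] = not config[i]            # merge (True) or split (False)
            proposed = cost(config)
            delta = proposed - current
            if delta < 0 or rng.random() < math.exp(-delta / temperature):
                current, moved = proposed, True  # accept the new configuration
            else:
                config[i] = not config[i]        # reject: undo the move
        quiet_stages = 0 if moved else quiet_stages + 1
        if quiet_stages >= patience:             # settled for `patience` stages
            break
        temperature *= cooling                   # reduce the temperature by 5%
    return config, current


# Toy problem: recover a known link pattern for 20 events.
target = [True, True, False, True, False, False, True, True, False, True,
          False, True, True, False, False, True, False, True, True]


def toy_cost(config):
    """Stand-in for the area statistic: fraction of links that differ from target."""
    return sum(a != b for a, b in zip(config, target)) / len(target)


rng = random.Random(3)
start = [rng.random() < 0.5 for _ in target]
best_config, best_cost = anneal(start, toy_cost, rng)
print("final cost:", best_cost)
```

Accepting some cost-increasing moves at high temperature lets the search leave poor configurations early on, while the gradual cooling makes the procedure increasingly greedy until the configuration stops changing.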
Fig. 1. Example of a periodic and cascading stochastic process. A, Expected probability of starting an active interval during a particular day of the week $p_w(t)$. We
depict two weeks to emphasize that this pattern is periodic and that every week is statistically identical to every other week. We surmise that e-mail users are more
likely to send e-mails on the same days of the week, a consequence of regular work schedules. B, Expected probability of starting an active interval during a particular
time of the day $p_d(t)$. Again, we depict 14 days to emphasize that this pattern is periodic and that every day is statistically identical to every other day. We surmise
that e-mail users are more likely to send e-mails during the same times of the day, a consequence of circadian sleep patterns. C, The resulting activity rate $\rho(t)$ for the
non-homogeneous Poisson process. The activity rate $\rho(t)$ is proportional to the product of the daily and weekly patterns of activity where the proportionality constant
$N_w$ is the average number of active intervals per week (Eq. 1). D, A time series of events generated by a non-homogeneous Poisson process. Each event in this
time series initiates a cascade of additional events, an active interval. E, Schematic illustration of cascading activity. During cascades (active intervals) we expect
that an individual will send $N_a$ additional e-mails according to a homogeneous Poisson process with rate $\rho_a$. We denote the start of active intervals with a dashed
line to signify that the activity is no longer governed by the non-homogeneous Poisson process rate $\rho(t)$. Once the active interval concludes, e-mail usage is again
governed by the periodic rate $\rho(t)$. We refer to the collection of active intervals as the active interval configuration $C$ throughout the manuscript. F, Observed time
series. Since the data does not isolate intervals of activity, the observed time series is the superposition of both the non-homogeneous Poisson process time series
and the active interval time series.
Fig. 2. Systematic deviations of the data from the truncated power-law null model due to periodic patterns of human activity. The vertical lines at $\tau = 10$ hours
are meant as a guide to the eye. A-B, Comparison of truncated power-law model (red line) with empirical data (□) for Users 2650 and 467 from the data set [23].
Lines of best fit are estimated by minimizing the area test statistic (see SI Sec. S4). C-D, Log-residual, $R = \ln(p_M(\tau|\hat{\theta})/p(\tau))$, of the best-fit truncated power-law
distribution model $M$. The shaded region denotes inter-event times where the null model underestimates the data. If the empirical inter-event time distribution were
well-described by the truncated power-law null model, the log-residuals $R$ would be small and normally distributed, particularly in the tail of the distribution. However,
the log-residuals $R$ have large systematic fluctuations in the tail of the inter-event time distribution ($\tau > 0.25$ hours) where the power-law scaling approximately
holds. E, Conditional probability density $p(R|\tau)$ obtained for all 394 users under consideration. The average log-residual at each inter-event time is represented by
the dashed line. Both the average log-residual and conditional probability density indicate that nearly all users under consideration systematically deviate from the
truncated power-law null model, as anticipated from the arguments in the main text.
Fig. 3. Patterns of e-mail activity for four users in increasing order of e-mail usage (see SI Fig. S7 for the same analysis for all 394 users). These e-mail users
exemplify the e-mail usage patterns that are typical of the users in the data set. We use simulated annealing to identify active intervals and calculate the parameters
for the cascading non-homogeneous Poisson process (see Methods). The red distributions and text (A-B) correspond with the parameters for the primary process,
a non-homogeneous Poisson process, while the blue distributions and text (C) correspond with the parameters for the secondary process, a homogeneous Poisson
process. A-B, Active intervals are much more likely during weekdays rather than weekends and during the daytime rather than the nighttime. These prolonged periods
of inactivity lead to the heavy tail in the inter-event time distribution. C, Small inter-event times, in contrast, are characterized by active intervals. One can interpret
active intervals in several ways: larger $\rho_a$ may indicate that a user is a more proficient e-mail user; larger $\langle N_a \rangle / \rho_a$ may suggest that an individual has a larger attention
span; $N_a / \rho_a$ may be the time that an individual has to check e-mail before their next commitment.
Fig. 4. Comparison of the predictions of the cascading non-homogeneous Poisson process (red line) with the empirical cumulative distribution of inter-event times
$P(\tau)$ (black line) for the same users from Fig. 3 (see SI Fig. S7 for the same analysis for all 394 users). We use the area test statistic $A$ (Eq. 2) and Monte Carlo
hypothesis testing to calculate the p-value between the model and the data (see SI Sec. S3). As these figures are presented, the area test statistic $A$ is the area
between the two curves. Not only do the predictions of the cascading non-homogeneous Poisson process visually agree with the empirical data, but the p-values
indicate that it cannot be rejected as a model of e-mail activity at a conservative 5% significance level.
Fig. 5. A, Summary of the hypothesis testing results for the cascading non-homogeneous Poisson process and the truncated power-law null model for the 394 users
under consideration. For each user, we compute the p-value between their inter-event time distribution and the predictions of each model (see SI Sec. S3). We reject
a model for a particular user if the p-value is less than the 5% rejection threshold (gray shaded region). At this significance level, the cascading non-homogeneous
Poisson process can be rejected for one user while the truncated power-law null model can be rejected for 344 users (see SI Sec. S4). Note that if the data were
actually generated by one of the models tested, we would expect to see a uniform distribution of p-values (dashed line). Since this is very nearly the case for the
cascading non-homogeneous Poisson process, this provides additional evidence that our model is consistent with the data. B, Conditional probability density $p(R|\tau)$
obtained for all 394 users under consideration. The average log-residual at each inter-event time is represented by the dashed line. In contrast to the results in Fig. 2,
we find no systematic deviations between the model predictions and the data in the tail of the inter-event time distribution where the power-law scaling approximately
holds.